I came across Buffon's Needle in the book Proofs from THE BOOK by Martin Aigner and Günter M. Ziegler.
The following problem was posed in 1777 by the Comte de Buffon:
Suppose you drop a short needle on ruled paper - what is the probability that the needle comes to lie in a position where it crosses one of the lines?
The answer - surprisingly - involves $\large{\pi}$, and consequently it provides a method to approximate the value of $\large{\pi}$ through Monte Carlo simulation.
A short needle means that the length of the needle $(\ell)$ must be less than the distance between the lines ($d$) - otherwise a needle can cross two lines and this changes the result to something much less clean. In this short needle case, the probability that a needle comes to rest crossing one of the lines is
$$p = \dfrac{2 \ell}{d \pi} \; .$$
Given this result, an approximate value for $\pi$ would be
$$\hat \pi \approx \dfrac{2\ell}{d} \dfrac{N}{n}$$
where $N$ is the number of trials and $n$ is the number of "hits" i.e. intersections.
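As a purely illustrative calculation (made-up numbers, not the result of a real run): with $\ell = 1$, $d = 2$ and $n = 3{,}183$ hits out of $N = 10{,}000$ throws, the estimate would be

$$\hat \pi \approx \dfrac{2 \times 1}{2} \times \dfrac{10{,}000}{3{,}183} \approx 3.1417 \; .$$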
Let's look at a single needle.
Each needle will have its base (lower end) between two lines at distance $y$ from the lower line. (We can ignore the case where the base of the needle lies exactly on a line as this has probability zero.) And the needle will lie at angle $\alpha$ from the horizontal. The vertical height of the needle is therefore $\ell \sin \alpha$.
So for the needle to intersect the top line we require $\; y + \ell \sin \alpha \ge d$.
$y$ is distributed uniformly from $0$ to $d$ i.e. $y \sim U(0,d)$. And $\alpha$ is distributed uniformly from $0$ to $\pi$ i.e. $\alpha \sim U(0, \pi)$. The probability density value inside those ranges is just the value that makes the area under the probability density function equal to one. E.g. $1/d \times d = 1$.
What's more, these two distributions are independent. So every point in the $2D$ region spanned by those ranges of $y$ and $\alpha$ is equally likely. And so the joint distribution is also uniform and its density can be pictured as a box (or rectangular cuboid).
$$y \sim U(0, d) \; , \qquad \int_0^d \frac{1}{d} \; dy = 1$$

$$\alpha \sim U(0, \pi) \; , \qquad \int_0^\pi \frac{1}{\pi} \; d\alpha = 1$$

$$(y, \alpha) \sim U \big( (0, d) \times (0, \pi) \big) \; , \qquad \int_0^\pi \int_0^d \frac{1}{d \pi} \; dy \; d\alpha = 1$$
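To make the setup concrete, here is a minimal sketch of a single needle drop sampled from these distributions, checking the hit condition above (my own illustration, not from the original derivation; the helper name is hypothetical):

```julia
# Sketch of a single needle drop: sample the base position y ~ U(0, d) and the
# angle α ~ U(0, π), then check the hit condition y + ℓ sin(α) ≥ d.
function single_needle_hit(ℓ, d)
    y = rand() * d      # distance of the needle's base above the lower line
    α = rand() * π      # angle of the needle from the horizontal
    return y + ℓ * sin(α) >= d
end
```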
The probability we want is the volume under this flat density over the region of "hits" i.e. intersections. And recall that a "hit" requires $y + \ell \sin \alpha \ge d$, or equivalently $y \ge d - \ell \sin \alpha$.
So instead of integrating along $y$ from $0$ to $d$ we integrate from $d - \ell \sin \alpha$ to $d$.
i.e.
$$
\begin{aligned}
P(\text{hit}) &= \int_0^\pi \int_{d - \ell \sin \alpha}^d \frac{1}{d \pi} \,dy \,d\alpha \\[1em]
&= \frac{1}{d \pi} \int_0^\pi \Big[ y \Big]_{d - \ell \sin \alpha}^d \,d\alpha \\[1em]
&= \frac{1}{d \pi} \int_0^\pi \ell \sin \alpha \,d\alpha \\[1em]
&= \frac{\ell}{d \pi} \Big[ -\cos \alpha \Big]_0^\pi \\[1em]
&= \frac{2}{\pi} \frac{\ell}{d}
\end{aligned}
$$
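As a quick sanity check of this integral (again my own sketch, in plain Julia, not part of the original post), a midpoint Riemann sum over $\alpha$ reproduces $\frac{2\ell}{d\pi}$:

```julia
# Numerically approximate P(hit) = ∫₀^π (ℓ sin α)/(dπ) dα with a midpoint Riemann sum.
function p_hit_numeric(ℓ, d; steps=10_000)
    Δα = π / steps
    total = 0.0
    for k in 1:steps
        α = (k - 0.5) * Δα        # midpoint of the k-th interval
        total += ℓ * sin(α) * Δα  # inner integral over y contributes ℓ sin α
    end
    return total / (d * π)
end

p_hit_numeric(1, 2), 2 * 1 / (2 * π)   # both ≈ 0.3183
```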
Now let's test the theorem with a simulation using Julia.
The function `rand()` uniformly picks a number from the range $[0, 1)$. So `rand() * d` uniformly picks a number from the range $[0, d)$. We could also have used the Distributions package and called `rand(Uniform(0, d))`.
Note that `d - rand() * d` is equivalent (just horizontally flipped, which is okay as the uniform distribution is symmetric) to `rand() * d`.
So the inner test of

`rand() * d + ℓ * sin(rand() * π) >= d`

simplifies (using the equivalence above) to

`rand() * d <= ℓ * sin(rand() * π)`

or

`rand() <= ℓ / d * sin(rand() * π)`
```julia
function π_est(N, ℓ, d)
    n = 0                # number of "hits"
    r = ℓ / d
    for i in 1:N
        rand() <= r * sin(rand() * π) && (n += 1)
    end
    return 2r * N/n
end
```
Note that each trial is done individually inside the for loop. It would have been an option to generate a vector of random numbers by calling, say for 1,000 trials, `rand(1000)`. Some languages would recommend this technique, but Julia is so quick at running simulations that this isn't necessary and, what's more, we can run many more simulations as we don't need the memory - e.g. `rand(10_000_000)` (with 64-bit floating-point numbers) would require 80MB of memory, but running the simulation in a loop uses (as you can see below) only 176 bytes.
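For comparison, a vectorised version might look like the sketch below (my own illustration, not from the post; the function name is hypothetical); it allocates two length-$N$ vectors of random numbers instead of drawing them inside a loop:

```julia
# Vectorised sketch: allocate N random base positions and N random angles up front,
# then count the hits with a broadcast comparison. Uses O(N) memory, unlike the loop.
function π_est_vec(N, ℓ, d)
    r = ℓ / d
    n = count(rand(N) .<= r .* sin.(rand(N) .* π))
    return 2r * N / n
end
```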
Let's run this 10 million times! - and then compare to the true $\large{\pi}$. We can estimate with any values of $\ell$ and $d$. Let's try $\ell = 1$ and $d = 2$ (recall $\ell < d$).
```julia
@time π_hat = π_est(10_000_000, 1, 2)

# π is a built-in constant in Julia
π

println("absolute difference   : $(round(π_hat - π, digits=5))")
println("percentage difference : $(round((π_hat - π) / π * 100, digits=3))%")
```
A pretty good approximation - which you'd hope for after that crazy number of needles.
The approximate value for $\pi$ from above is $\hat \pi \approx \dfrac{2\ell}{d} \dfrac{N}{n}$.
Calculating the variance of the estimator $\hat \pi$ is not straightforward as it involves the reciprocal of the Binomial distribution. But following the method in the article "Sharpening Buffon's Needle" by Perlman and Wichura, we can estimate the variance.
Start with the variance of the estimator $\dfrac{1}{\hat \pi} \approx \dfrac{d}{2 \ell N} n$ as follows:
$$
\begin{aligned}
n &\sim \mathrm{Binomial}(N, p) \\[1em]
\mathrm{Var}(n) &= Np(1-p) \\[1em]
\mathrm{Var}\left(\frac{1}{\hat \pi}\right) &= \frac{d^2}{4 \ell^2 N^2} \, Np(1-p) = \frac{d^2}{\ell^2} \frac{1}{4N} \, p(1-p)
\end{aligned}
$$
Then, using the delta method (see the Wikipedia page and this page on Cross Validated) with $g(x) = 1/x$, for which $\mathrm{Var}\big(g(X)\big) \approx g'(\mu)^2 \, \mathrm{Var}(X) = \mathrm{Var}(X)/\mu^4$, we can approximate the asymptotic variance of the estimator $\hat \pi$ as follows:
$$
\begin{aligned}
\mathrm{Var}(\hat \pi) &= \mathrm{Var}\left( \frac{1}{\hat \pi} \right) \, \mathrm{E}({\hat \pi})^4 \\[1em]
&= \frac{d^2}{\ell^2} \frac{1}{4N} \, p(1-p) \, \pi^4 \\[1em]
&= \frac{d^2}{\ell^2} \frac{1}{4N} \frac{4 \ell^2}{d^2 \pi^2} \left( \frac{d \pi}{2 \ell} - 1 \right) \pi^4 \hspace{3em} \text{using} \hspace{1em} p(1-p) = p^2 \left( \frac{1}{p} - 1 \right) = \frac{4 \ell^2}{d^2 \pi^2} \left( \frac{d \pi}{2 \ell} - 1 \right) \\[1em]
&= \frac{\pi^2}{N} \left( \frac{d}{\ell} \frac{\pi}{2} - 1 \right)
\end{aligned}
$$
Note that to minimise the variance (of both $\hat \pi$ and $\dfrac{1}{\hat \pi})$ requires minimising $\dfrac{d}{\ell}$ i.e. setting $d = \ell$ as $d$ cannot be shorter than $\ell$.
Let's create convenience functions of the estimated variance and standard deviation, and then test with simulation. For the minimum variance case of $d = \ell$ we'd expect the variance to be $\dfrac{\pi^2}{N} \left( \dfrac{\pi}{2} - 1 \right) \approx \dfrac{5.63}{N}$.
```julia
# Estimated variance and standard deviation of π̂
σ²(N, ℓ, d) = π^2/N * (d/ℓ * π/2 - 1)
σ(N, ℓ, d) = √σ²(N, ℓ, d);
```
For $N=10,000$ (i.e. $10,000$ needles) and $d = \ell$ the variance estimate is:
```julia
N, ℓ, d = 10_000, 1, 1
σ²(N, ℓ, d)
```
Running this simulation $5,000$ times and calculating the sample variance gives:
```julia
using Statistics   # for var

var([π_est(10_000, 1, 1) for _ = 1:5000])
```
- a pretty good match.
We can now say more about the expected accuracy of the estimate of $\pi$. We can expect to have an error of not more than two standard deviations away from the mean about $95\%$ of the time.
For $d = \ell$ and $N = 10,000$ this is:
```julia
2σ(10_000, 1, 1)
```
For $d = \ell$ and $N = 10,000,000$ this is:
```julia
2σ(10_000_000, 1, 1)
```
Now let's run the simulation again with this optimal setting of $d=\ell$.
```julia
π_hat = π_est(10_000_000, 1, 1)

println("π estimate          : $(round(π_hat, digits=6))")
println("π actual            : $(round(float(π), digits=6))")
println("absolute difference : $(round(π_hat - π, digits=6))")
```
We find the estimate is inside this error interval.
Turning this error estimate around, we can see how slow the convergence is for a given confidence interval. If we want a $95 \%$ confidence that the error is not more than $\epsilon$, we should throw $\left( 2 \dfrac{\sqrt{5.63}}{\epsilon} \right)^2$ needles. So if we want the error margin to be within 3 decimal places (i.e. $\epsilon = 0.001$) we'd need to drop over $22,000,000$ needles!
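To make the arithmetic concrete, a tiny helper (hypothetical, just for illustration):

```julia
# Needles required for ~95% confidence that the error is below ε, in the d = ℓ case,
# using Var(π̂) ≈ 5.63 / N from above.
needles_needed(ε) = ceil(Int, (2 * sqrt(5.63) / ε)^2)

needles_needed(0.001)   # ≈ 22.5 million
```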
Now returning to the choice of $d$ (i.e. how far apart to draw the horizontal lines - given needle length $\ell$). As shown above, to minimise the variance we want $d = \ell$. We can demonstrate this with simulations and charts. We'll run $1,000$ trials of $10,000$ needles each for four parameter settings of $d$.
Each point in the chart below is an estimate of $\pi$ and the red lines show the $95\%$ confidence interval estimated as above. As you can see, the case $d = \ell$ does indeed have the lowest variance.
```julia
using Gadfly # Plotting package

function plot_trials(;ℓ=1, d=1)
    estimates = zeros(1000)
    for k in 1:1000
        estimates[k] = π_est(10_000, ℓ, d)
    end
    l1 = layer(yintercept=[π - 2σ(10_000, ℓ, d), π + 2σ(10_000, ℓ, d)], Geom.hline(color="red"))
    l2 = layer(x=1:1000, y=estimates, Geom.point)
    guides = (Coord.cartesian(ymin=2.8, ymax=3.5),
              Guide.xticks(ticks=collect(200:200:1000)),
              Guide.yticks(ticks=[2.8, 2.9, 3.0, 3.1, 3.14, 3.2, 3.3, 3.4, 3.5]),
              Guide.xlabel("trial"), Guide.ylabel("estimate"), Guide.title("ℓ = $ℓ, d = $d"))
    plot(l1, l2, guides...)
end;

set_default_plot_size(24cm, 16cm)
vstack(hstack(plot_trials(ℓ=1, d=1), plot_trials(ℓ=1, d=2)),
       hstack(plot_trials(ℓ=1, d=4), plot_trials(ℓ=1, d=8)))
```